Skip to content

Prevent empty grouping sets from being eliminated on empty input#22039

Open
xiedeyantu wants to merge 2 commits intoapache:mainfrom
xiedeyantu:empty-groupset
Open

Prevent empty grouping sets from being eliminated on empty input#22039
xiedeyantu wants to merge 2 commits intoapache:mainfrom
xiedeyantu:empty-groupset

Conversation

@xiedeyantu
Copy link
Copy Markdown
Member

@xiedeyantu xiedeyantu commented May 6, 2026

Which issue does this PR close?

Rationale for this change

GROUPING SETS (()) should return one row even when the input is empty, because the empty grouping set represents a global aggregate. In the current behavior, queries such as WHERE false GROUP BY GROUPING SETS(()) can incorrectly be optimized into an empty result, returning 0 rows instead of the expected 1 row. This change fixes that SQL semantics bug.

What changes are included in this PR?

This PR makes two related fixes:

  1. It updates the empty-relation propagation rule so aggregates with an empty grouping set are not eliminated.
  2. It updates the grouped hash aggregate execution path so empty grouping sets are initialized correctly and still produce output on empty input.

It also adds regression coverage in grouping.slt for:

  • GROUPING SETS (()) on empty input
  • COUNT(*) on empty input
  • mixed grouping sets where only the empty grouping set should produce a row

Are these changes tested?

Yes. I ran:

  • cargo test -p datafusion-sqllogictest -- grouping
  • cargo test -p datafusion-optimizer -- propagate_empty
  • cargo clippy --tests --all-features -p datafusion-physical-plan -p datafusion-optimizer -- -D warnings
  • cargo fmt --check

Are there any user-facing changes?

Yes. Queries using GROUPING SETS (()), and other grouping-set forms that include the empty set such as ROLLUP and CUBE, will now return the correct row on empty input instead of incorrectly returning no rows.

@github-actions github-actions Bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels May 6, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiedeyantu
Thanks for working on this.
Everything looks good overall. I just had one small suggestion around extending the test coverage a bit further.

----
55

# grouping_sets_empty_input: GROUPING SETS (()) must produce one NULL row on empty input
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition to cover explicit GROUPING SETS(()). Since the optimizer change also protects ROLLUP and CUBE through has_empty_grouping_set, it would be great to add small empty-input regression cases for ROLLUP(v1) and CUBE(v1) as well. That would help lock in the full invariant covered by this change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review! I added tests for CUBE and ROLLUP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GROUPING SETS (()) returns no rows on empty input

2 participants